1.         Introduction

Nonlinear  regression  models  with  measurement  error  are  important  but  difficult  to 
estimate.     Measurement  error  is  a  common  problem  in  microeconomic  data,   where  nonlinear 
(e.g.   discrete  choice)  models  are  often  of  interest.     Instrumental  variables  estimators 
are  not  consistent  for  these  models,  as  discussed  in  Amemiya  (1985),  so  that  alternative 
approaches must be adopted. The purpose of this paper is to develop an approach based on
a  prediction  equation  for  the  true  variable,  that  uses  simulation  to  simplify 
computation.     The  approach  allows  for  flexibility  in  the  distribution  being  simulated, 
and  could  be  used  for  simulation  estimation  of  other  models. 

The  measurement  error  model  considered  here  is  the  prediction  model  analyzed  in 
Hausman,   Ichimura,  Newey,  and  Powell  (1991)  and  Hausman,  Newey,   and  Powell  (1993).     This 
model  has  a  prediction  equation  for  the  true  regressor  with  a  disturbance  that  is 
independent  of  the  predictors.     This  previous  work  shows  how  to  consistently  estimate 
polynomial  regression  models,  and  general  regression  models  via  polynomial 
approximations.     This  paper  avoids  polynomial  approximation  by  working  directly  with 
certain  integrals,   estimating  them  by  simulation  methods.     Other  work  relies  on  the 
assumption  that  the  variance  of  the  measurement  error  shrinks  with  sample  size,  including 
Wolter  and  Fuller  (1982)  and  Y.   Amemiya  (1985).     This  approach  is  applicable  when  there 
are  a  large  number  of  measurements  on  true  regressors,  but  this  situation  does  not  occur 
often  in  econometric  practice. 

Flexibility  in  distribution  of  the  prediction  error  is  desirable,  because 
consistency  of  the  estimator  depends  on  correct  specification.     It  is  also  important  to 
manage computation costs, so that the estimator is feasible for a variety of regression
models.     These  goals  are  accomplished  by  combining  simulated  moment  estimation  with  a 
linear  in  parameters  specification  for  distributional  shape.     Simulated  moment  estimation 
provides  a  convenient  approach  when  estimation  equations  are  integrals,  e.g.  Lerman  and 
Manski  (1981),  Pakes  (1986),   and  McFadden  (1989).     This  approach  uses  Monte  Carlo  methods 
to form unbiased estimators of integrals in moment equations. Flexibility in
distribution  is  incorporated  by  multiplying  by  a  linear  in  parameters  function  that 
approximates  the  ratio  of  the  true  density  to  the  simulated  one.     This  approach  is 
similar  to  the  importance  sampling  technique  from  the  simulation  literature. 

Section  2  describes  the  errors  in  variables  model  and  some  of  its  implications  for 
conditional  moments.      Section  3  lays  out  the  estimation  method  and  discusses  parametric 
asymptotic  inference  for  the  estimator.     Section  4  gives  a  semiparametric  consistency 
result.     Section  5  presents  results  of  a  small  Monte  Carlo  study.      Section  6  describes  an 
empirical example of Engel curve estimation of the relationship between income and
consumption. 


2.         The Model


The  model  considered  here  is 


(2.1)  $y = f(w^*, \delta_0) + \varepsilon, \quad E[\varepsilon \mid x, v] = 0,$

       $w = w^* + \eta, \quad E[\eta \mid x, v, \varepsilon] = 0,$

       $w^* = \pi_0'x + \sigma_0 v, \quad v$ independent of $x$,


where $y$ and $\varepsilon$ are scalars, $\delta_0$, $w^*$, $w$, $\eta$, $x$, and $v$ are vectors, and $\pi_0$ and
$\sigma_0$ are conformable matrices. Here $w^*$ represents true regressors, $\eta$ measurement
errors, and $w$ observed regressors. The $x$ are observed and $w^*$, $\varepsilon$, $\eta$, and $v$ may be
unobserved. The last equation is a prediction equation for the true regressors, where $x$
are observed predictor variables, $v$ is an unobserved prediction error, and $\sigma_0$ is a
scaling matrix, a square root of a variance matrix. Some of the true regressors can be
allowed to be observed, with $w^*$ equal to an element of $x$, by specifying that the
corresponding elements of $\eta$ and $\sigma_0 v$ are identically zero, and the corresponding
element of $\pi_0'x$ is $w^* = w$. This model was considered by Hausman, Ichimura, Newey, and
Powell (1991) (HINP henceforth), for the special case where $f(w^*, \delta)$ is a polynomial in
$w^*$. As long as $x$ includes a constant, the location and scale of $v$ can be normalized,
e.g. as $E[v] = 0$ and $\mathrm{Var}(v) = I$ when the second moment of $v$ exists.

Instrumental variables (IV) estimators can be used to estimate this model when
$f(w^*, \delta)$ is linear in $w^*$. Substituting $w - \eta$ for $w^*$ in the first equation makes $x$
valid instruments, because the disturbance is linear in the measurement error $\eta$. In
the nonlinear case this substitution leads to residuals that are nonlinear in $\eta$.
Consequently, $x$ will not be valid instruments, and another approach has to be adopted.
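The contrast between the linear and nonlinear cases can be illustrated numerically. The sketch below is only an illustration under assumed settings (normal errors, a scalar regressor, the hypothetical regression function $f(w^*, \delta) = \delta e^{-w^*}$, and arbitrary variances); none of these choices come from the paper. With these assumptions the naive IV estimator that replaces $e^{-w^*}$ by $e^{-w}$ converges to $\delta e^{-\mathrm{Var}(\eta)/2}$ rather than $\delta$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400_000                              # large n so sampling noise is small
x = 0.5 * rng.standard_normal(n)         # observed predictor
v = 0.5 * rng.standard_normal(n)         # prediction error, independent of x
eta = 0.5 * rng.standard_normal(n)       # measurement error, Var(eta) = 0.25
eps = 0.1 * rng.standard_normal(n)       # regression disturbance
w_star = x + v                           # true regressor
w = w_star + eta                         # observed, mismeasured regressor

def iv_slope(y, regressor, instrument):
    # Just-identified IV slope: cov(instrument, y) / cov(instrument, regressor).
    return np.cov(instrument, y)[0, 1] / np.cov(instrument, regressor)[0, 1]

delta = 2.0
# Linear case: the disturbance is linear in eta, so x is a valid instrument.
iv_linear = iv_slope(delta * w_star + eps, w, x)
# Nonlinear case (assumed f): substituting w for w_star leaves eta inside exp().
iv_nonlinear = iv_slope(delta * np.exp(-w_star) + eps, np.exp(-w), x)
limit_nonlinear = delta * np.exp(-0.25 / 2.0)   # delta / E[exp(-eta)]
print(iv_linear, iv_nonlinear, limit_nonlinear)
```

In this design the linear IV estimate should be close to 2, while the nonlinear one should settle near $2e^{-0.125}$, which is the attenuation induced by $E[e^{-\eta}] \neq 1$.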

An approach to consistent estimation can be based on integrating out the prediction
error. Let $g_0(v)$ be the density of $v$. Integrating leads to three equations:

(2.3a)  $E[y \mid x] = \int f(\pi_0'x + \sigma_0 v, \delta_0)\, g_0(v)\, dv,$

(2.3b)  $E[w y \mid x] = \int [\pi_0'x + \sigma_0 v]\, f(\pi_0'x + \sigma_0 v, \delta_0)\, g_0(v)\, dv,$

(2.3c)  $E[w \mid x] = \pi_0'x.$


The  first  is  a  regression  of     y     on     x,     analogous  to  the  usual  one,  except  that  the 
unobserved  variable     v     has  been  integrated  out.     The  second  equation  is  a  regression  of 
$wy$ on $x$, which is less familiar. The third equation is a standard regression
equation. 
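As a rough numerical illustration of the first two integrals, the sketch below evaluates them by Gauss-Hermite quadrature for a scalar $x$. It assumes a standard normal $g_0$ and a hypothetical regression function $f(w^*, \delta) = \delta_1 + \delta_2 e^{-w^*}$; the function names and parameter values are illustrative only. For this particular $f$ the integrals have closed forms, which gives a check on the quadrature.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def cond_moments(x, pi, sigma, delta, f, n_nodes=40):
    """Return (E[y|x], E[wy|x]) for scalar x when v ~ N(0, 1) (an assumption)."""
    # hermegauss integrates against exp(-v^2 / 2); rescale to the N(0, 1) density.
    nodes, weights = hermegauss(n_nodes)
    weights = weights / np.sqrt(2.0 * np.pi)
    w_star = pi * x + sigma * nodes          # true regressor at each node
    fv = f(w_star, delta)
    return np.sum(weights * fv), np.sum(weights * w_star * fv)

def f_exp(w_star, delta):
    # Hypothetical regression function, chosen only because it integrates in closed form.
    return delta[0] + delta[1] * np.exp(-w_star)

m1, m2 = cond_moments(x=0.3, pi=1.0, sigma=0.7, delta=(1.0, 1.0), f=f_exp)

# Closed forms for w* ~ N(mu, s2): E[exp(-w*)] and E[w* exp(-w*)].
mu, s2 = 0.3, 0.49
m1_exact = 1.0 + np.exp(-mu + s2 / 2.0)
m2_exact = mu + (mu - s2) * np.exp(-mu + s2 / 2.0)
print(m1, m1_exact, m2, m2_exact)
```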

The second equation is important for identification of nonlinear models. The
components of this equation corresponding to unobserved $w^*$ (i.e. those not corresponding
to observed covariates) provide information additional to the first equation. As shown
in HINP for polynomial regression, the first equation does not suffice for
identification. Intuitively, there are two functions that need to be identified, the
regression function and the density of $v$, so that two equations are needed for
identification. It was shown in HINP that the parameters of any polynomial regression
equation are identified from these two equations, and one expects that identification of
the regression parameters will hold more generally.

It is beyond the scope of this paper to develop fully primitive identification
conditions for this model, but some things can be said. First, the parameters $\pi_0$ are
identified from (2.3c) as the coefficients of a least squares regression of $w$ on $x$,
so $\pi_0$ can be treated as known and identification of the other pieces of the model
considered by focusing on equations (2.3a) and (2.3b). If $\pi_0'x$ has a discrete
distribution with a finite support and $m$ points of positive probability, then (2.3a) and
(2.3b) provide $2m$ equations. Assuming that none are redundant, i.e. that a "rank condition"
holds, one could identify $2m$ parameters from these equations, including $\delta_0$ and
parameters of a parametric family of distributions for $v$. For example, HINP showed
that, in the case where $w^*$ is a scalar and $f(w^*, \delta)$ is a polynomial of degree $p$, the
$\delta$ parameters are identified if the second moment matrix of the vector of powers
$(1, \pi_0'x, \ldots)'$ of $\pi_0'x$ is nonsingular. Also, some of the moments of $v$ are identified in
this case. If $\pi_0'x$ has a continuous distribution, then a simple counting argument
suggests that $f(w^*, \delta_0)$ and $g_0(v)$ should be identified. Assuming that the left hand sides
of (2.3a) and (2.3b) are distinct functions, these equations give two functional equations,
and there are two functions to be identified. So, by an analogy with the finite
dimensional case, it should be possible, under appropriate regularity conditions, to
identify both the regression function for $w^*$ and the density function for $v$. Making this
intuition precise would be quite difficult, because of the nonlinear, nonparametric (i.e.
functional) nature of these equations, but it is an important problem deserving of future
attention.

Independence  of     x     and     v     is  a  strong  assumption,   but  in  the  general  nonlinear 
model  of  equation  (2.1)  it  is  difficult  to  drop  this  assumption.     Intuitively,   if  some 
moments  of     v     can  depend  on     x,     then  it  is  much  more  difficult  to  separate  the 
regression  function  from  the  distribution. 
3.       Estimation

To describe the estimator it is helpful to embed the model in a more general
conditional moment setup. Let $z$ denote a data observation, $\beta$ a $q \times 1$ vector of
parameters, $g$ a density function of a random vector $v$, $\rho(z, \beta, g)$ an $r \times 1$ residual
vector, and $H(z, \beta, v)$ an $r \times 1$ vector of functions, related as in

(3.1)  $\rho(z, \beta, g) = \int H(z, \beta, v)\, g(v)\, dv.$


Suppose that there is a set of conditioning variables $x$ such that for the true
parameter value $\beta_0$ and density $g_0$,

(3.2)  $E[\rho(z, \beta_0, g_0) \mid x] = 0.$


The nonlinear errors-in-variables model is a special case of this one, where

(3.3)  $H_1(z, \beta, v) = y - f(\pi'x + \sigma v, \delta),$

       $H_2(z, \beta, v) = L\{w \cdot y - [\pi'x + \sigma v] \cdot f(\pi'x + \sigma v, \delta)\},$

       $H_3(z, \beta, v) = w - \pi'x, \quad \beta = (\delta', \sigma, \pi')',$

and $L$ is a selection matrix that picks out those elements of $w$ that include
measurement error.

The  common  approach  to  using  equation  (3.2)  in  estimation  is  nonlinear  instrumental 
variables.     One  difficulty  with  this  approach  is  that  the  density     g(v)     is  unknown. 
Another difficulty is that the residual is an integral that may be difficult to
compute.     Here,  these  difficulties  are  dealt  with  simultaneously,  by  choosing  a 
flexible  parameterization  for  the  density  that  makes  it  easy  to  use  a  simulation 
estimator  of  the  integral.     To  describe  this  approach,  we  begin  with  a  specification  of 
the  density  function. 
For now, suppose that the density is a member of a parametric family, of the form

(3.4)  $g(v, \gamma) = P(v, \gamma)\varphi(v), \quad P(v, \gamma) = \sum_{j=1}^{J} \gamma_j p_j(v),$

where $\varphi(v)$ is some fixed density function. For example, if $\varphi(v)$ were standard normal
and the $p_j(v)$ were powers of $v$, then this would be an Edgeworth approximation. The
function $g(v, \gamma)$ need not be positive, but leads to residuals that are linear in the shape
parameters $\gamma$ and that can easily be estimated by simulation.

For a density like that of equation (3.4), a simulated residual can be constructed
by drawing random variables from $\varphi(v)$ and then evaluating the product of the linear
combination $P(v, \gamma)$ and the $H$ functions. Let $z_i$ denote a single observation and
$(v_{i1}, \ldots, v_{iS})$ denote a vector of random variables, each having marginal density $\varphi(v)$.
For example, if $\varphi(v)$ is a standard normal pdf, then $(v_{i1}, \ldots, v_{iS})$ could be computer
generated Gaussian random numbers. Then an estimator of the residual $\rho(z_i, \beta, g(\gamma))$ for
the $i$-th observation is

(3.5)  $\hat\rho_i(\theta) = S^{-1} \sum_{s=1}^{S} H(z_i, \beta, v_{is}) P(v_{is}, \gamma), \quad \theta = (\beta', \gamma')'.$

This is essentially an importance sampling estimator of the residual, where $\varphi(v)$ is the
sampling density and $P(v, \gamma)$ approximates $g(v)/\varphi(v)$. The simulated residual is an
unbiased estimator of the true residual, because

       $E[\hat\rho_i(\theta) \mid z_i] = \rho(z_i, \beta, g(\gamma)).$
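The unbiasedness of the simulated residual in equation (3.5) can be checked numerically. The following is a minimal sketch under assumptions that are not from the paper: a scalar model, a standard normal baseline density $\varphi$, the hypothetical regression function $f(w^*, \delta) = \delta_1 + \delta_2 e^{-w^*}$, and a single shape term $P(v, \gamma) = 1 + \gamma (v^3 - 3v)/\sqrt{6}$. Only the first residual function of (3.3) is used.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def P(v, gamma):
    # Assumed shape function: 1 plus a multiple of the orthonormal third Hermite polynomial.
    return 1.0 + gamma * (v**3 - 3.0 * v) / np.sqrt(6.0)

def H(y, x, v, pi, sigma, delta):
    # First residual function of (3.3) with a hypothetical f(w, delta) = d1 + d2 * exp(-w).
    w_star = pi * x + sigma * v
    return y - (delta[0] + delta[1] * np.exp(-w_star))

def simulated_residual(y, x, draws, pi, sigma, delta, gamma):
    # Equation (3.5): average of H * P over draws from the baseline density.
    return np.mean(H(y, x, draws, pi, sigma, delta) * P(draws, gamma))

def exact_residual(y, x, pi, sigma, delta, gamma, n_nodes=60):
    # The integral of H * P * phi computed by Gauss-Hermite quadrature.
    nodes, weights = hermegauss(n_nodes)
    weights = weights / np.sqrt(2.0 * np.pi)
    return np.sum(weights * H(y, x, nodes, pi, sigma, delta) * P(nodes, gamma))

rng = np.random.default_rng(0)
args = dict(pi=1.0, sigma=0.7, delta=(1.0, 1.0), gamma=0.2)
rho_hat = simulated_residual(2.0, 0.3, rng.standard_normal(200_000), **args)
rho = exact_residual(2.0, 0.3, **args)
print(rho_hat, rho)
```

With many draws the simulated value should be close to the quadrature value; in estimation only a few draws per observation are needed, because the averaging over observations removes the simulation noise.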


Therefore, by the results of McFadden (1989) and Pakes and Pollard (1989), an
instrumental variables (IV) estimator with $\hat\rho_i(\theta)$ as the residual should be consistent
if the IV estimator with the true residual is. An IV estimator can be formed in a
familiar way. Let $A(x)$ denote a $q \times r$ matrix of instrumental variables, that may be
estimated. Suppose that $\hat\theta$ solves

(3.6)  $n^{-1} \sum_{i=1}^{n} A(x_i) \hat\rho_i(\hat\theta) = 0.$


This  is  a  simulated,  nonlinear  IV  estimator  like  that  of  McFadden  (1989). 

Because equation (3.6) is linear in $P(v, \gamma)$, it is important to normalize the
density $P(v, \gamma)\varphi(v)$ to integrate to one. Also, it may be important to impose a location
and scale normalization on this density. There are different ways to impose
normalizations by imposing constraints on the coefficients. For example, if $\varphi(v)$ is
the standard normal density and $p_1(v), p_2(v), \ldots$ are the Hermite polynomials that are
orthonormal with respect to the standard normal density (i.e. $\int p_j(v)^2 \varphi(v)\,dv = 1$ and
$\int p_j(v) p_k(v) \varphi(v)\,dv = 0$ for $j \neq k$), then $\gamma_1 = 1$, $\gamma_2 = 0$, and $\gamma_3 = 0$ will imply that
$P(v, \gamma)\varphi(v)$ integrates to one, and has zero mean and unit variance. It is also possible
to impose such constraints using the simulated values, by requiring that

I-",L^,(l.v.  ,v?  )P(v.  ,r)  =  0. 

^1=1^=1  is      IS  IS 
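The coefficient normalization for orthonormal Hermite polynomials can be verified directly. The sketch below is an illustration rather than part of the estimator: it builds $P(v, \gamma)$ from the probabilists' Hermite polynomials scaled by $1/\sqrt{j!}$, sets the first three coefficients to $(1, 0, 0)$, leaves an arbitrary fourth coefficient free, and computes the first three moments of $P(v, \gamma)\varphi(v)$ by quadrature. The value 0.35 is an arbitrary choice.

```python
import numpy as np
from math import factorial
from numpy.polynomial.hermite_e import hermegauss, hermeval

def P(v, gamma):
    # gamma[j] multiplies He_j(v) / sqrt(j!), which is orthonormal under N(0, 1).
    coef = [g / np.sqrt(factorial(j)) for j, g in enumerate(gamma)]
    return hermeval(v, coef)

# Gauss-Hermite rule rescaled so the weights integrate against the N(0, 1) density.
nodes, weights = hermegauss(20)
weights = weights / np.sqrt(2.0 * np.pi)

gamma = [1.0, 0.0, 0.0, 0.35]        # first three coefficients normalized, fourth free
moments = [np.sum(weights * nodes**k * P(nodes, gamma)) for k in range(3)]
print(moments)                       # integral, mean, and second moment of P * phi
```

Because the fourth polynomial is orthogonal to all polynomials of degree two or less, the free coefficient changes skewness but not the integral, mean, or variance.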

In the nonlinear errors in variables model, it is convenient to work with a two step
estimator, where the first step consists of estimation of $\pi$ by least squares (LS), and
the second step is an instrumental variables estimator using the first two residuals of
equation (3.3). The first order conditions for such an estimator can be formulated as a
solution to equation (3.6), if $A(x)$ is chosen in a particular fashion. Let $\hat\pi$ be the
LS estimator and

(3.7)  $A(x) = \mathrm{diag}[B(x), x],$


where $B(x)$ has two columns and number of rows equal to the number of elements of
$(\delta', \sigma, \gamma')$. Suppose that constraints are imposed on the $\gamma$ coefficients such that
y  _.P(v.  ,3r)  =  0.     Then  the  solution  to  equation  (3.6),  with     A(x)     specified  as  in 
equation (3.7), requires that $\hat\pi$ be the least squares estimator, and that the other
parameters solve the equation
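(3.8)  $0 = \sum_{i=1}^{n} \hat B(x_i) \hat\rho_i(\hat\alpha), \quad \alpha = (\delta', \sigma, \gamma')',$

       $\hat\rho_i(\alpha) = \begin{pmatrix} y_i \\ L w_i y_i \end{pmatrix} - \frac{1}{S} \sum_{s=1}^{S} \begin{pmatrix} f(\hat\pi'x_i + \sigma v_{is}, \delta) \\ L(\hat\pi'x_i + \sigma v_{is}) f(\hat\pi'x_i + \sigma v_{is}, \delta) \end{pmatrix} P(v_{is}, \gamma).$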

In  the  empirical  example  the  estimator  minimizes  a  quadratic  form  that  has  this  type  of 
equation  as  its  first  order  condition,   although  the  normalization     T  -i^'^-  '^^  =  0     is 
not imposed. Specifically, for $C(x)$ equal to a vector of instrumental variables and $W$
a positive definite matrix, $\hat\alpha$ solves

(3.9)  $\min_{\alpha} \left[\sum_{i=1}^{n} C(x_i) \hat\rho_i(\alpha)\right]' W \left[\sum_{i=1}^{n} C(x_i) \hat\rho_i(\alpha)\right].$


The first order conditions to this minimization problem are as given in equation (3.8),
with

(3.10)  $\hat B(x) = \left[\partial \sum_{i=1}^{n} C(x_i) \hat\rho_i(\hat\alpha) / \partial\alpha\right]' W C(x).$
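The quadratic form in equation (3.9) is simple to code once the simulated residuals are available. The sketch below assumes the instrument and weight choices $C(x_i) = I \otimes R_i$ and $W = I \otimes (\sum_i R_i R_i')^{-1}$ for a vector of instruments $R_i$, and uses a hypothetical toy residual function only to exercise the code; `sm_objective` and `toy_residual` are illustrative names, not part of the paper.

```python
import numpy as np

def sm_objective(alpha, R, residual_fn):
    """Quadratic form m' W m with m = sum_i C(x_i) rho_i(alpha)."""
    rho = residual_fn(alpha)                       # (n, r) array of residuals
    r = rho.shape[1]
    # With C(x_i) = I_r kron R_i, the sum stacks R' rho[:, k] for k = 0..r-1.
    m = np.concatenate([R.T @ rho[:, k] for k in range(r)])
    W = np.kron(np.eye(r), np.linalg.inv(R.T @ R))
    return float(m @ W @ m)

# Toy data: residuals are exactly zero at alpha = (1, 2), so the objective is zero there.
rng = np.random.default_rng(1)
x = rng.standard_normal(50)
R = np.column_stack([np.ones(50), x, x**2])        # instruments
y = 1.0 + 2.0 * x

def toy_residual(alpha):
    # Hypothetical two-equation residual, standing in for the simulated residuals.
    e = y - alpha[0] - alpha[1] * x
    return np.column_stack([e, x * e])

q_true = sm_objective(np.array([1.0, 2.0]), R, toy_residual)
q_off = sm_objective(np.array([0.5, 2.5]), R, toy_residual)
print(q_true, q_off)
```

In practice `residual_fn` would return the simulated residuals for fixed simulation draws, so that the objective is a smooth function of the parameters and can be passed to a standard numerical optimizer.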


Standard large sample theory for IV can be used for asymptotic inference procedures.
If the simulated values $(v_{i1}, \ldots, v_{iS})$ are included with the data to form an augmented
observation for the $i$-th data point, then the usual IV formulae can be used to form a
consistent variance estimator. For example, suppose that $(z_i, v_{i1}, \ldots, v_{iS})$ are
independent observations as $i$ varies. Then under standard regularity conditions (e.g.
see Newey and McFadden, 1994), the asymptotic variance of $\sqrt{n}(\hat\theta - \theta_0)$ can be estimated
by

(3.11)  $\hat V = \hat G^{-1} \hat\Omega \hat G^{-1\prime}, \quad \hat G = n^{-1} \sum_{i=1}^{n} A(x_i)\, \partial\hat\rho_i(\hat\theta)/\partial\theta, \quad \hat\Omega = n^{-1} \sum_{i=1}^{n} A(x_i) \hat\rho_i(\hat\theta) \hat\rho_i(\hat\theta)' A(x_i)'.$
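The variance estimator in (3.11) is a standard sandwich form. As a sketch under stated assumptions, the function below takes the instruments, residuals, and residual derivatives as arrays (the array layout is a choice made here, not something specified in the paper) and is checked on ordinary least squares, where it should reproduce the heteroskedasticity-robust variance scaled by $n$.

```python
import numpy as np

def sandwich_variance(A, rho, drho):
    """G^{-1} Omega G^{-1}' with A: (n, q, r), rho: (n, r), drho: (n, r, q)."""
    n = rho.shape[0]
    G = np.einsum('iqr,irp->qp', A, drho) / n      # average of A_i * d rho_i / d theta
    moments = np.einsum('iqr,ir->iq', A, rho)      # A_i rho_i for each observation
    Omega = moments.T @ moments / n
    G_inv = np.linalg.inv(G)
    return G_inv @ Omega @ G_inv.T

# Check on least squares: residual y - x'b, instruments x, derivative -x'.
rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = X @ np.array([1.0, -0.5]) + rng.standard_normal(n)
b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
V = sandwich_variance(X[:, :, None], e[:, None], -X[:, None, :])

XtX_inv = np.linalg.inv(X.T @ X)
white = n * XtX_inv @ (X.T * e**2) @ X @ XtX_inv   # n times the robust OLS variance
print(np.sqrt(np.diag(V) / n))                     # standard errors
```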


If the simulation draws $v_{is}$ are mutually statistically independent as $s$ varies for a
given $i$, then one could also use the estimator

(3.12)  $\tilde V = \hat G^{-1} \tilde\Omega \hat G^{-1\prime}, \quad \tilde\Omega = n^{-1} \sum_{i=1}^{n} A(x_i) \hat\Lambda_i A(x_i)',$

        $\hat\Lambda_i = S^{-1} \sum_{s=1}^{S} H(z_i, \hat\beta, v_{is}) H(z_i, \hat\beta, v_{is})' P(v_{is}, \hat\gamma)^2.$


Both of these variance estimators ignore estimation of the instruments, which is valid
under standard regularity conditions. Because the large sample theory for these
estimators is straightforward, we do not give regularity conditions here.


4.       Consistent Semiparametric Estimation


If the functional form of the density $g_0(v)$ is left unspecified, then the
model becomes semiparametric. Models where identification is achieved by conditional
moment restrictions like those of equation (3.1) are nonlinear, nonparametric
simultaneous equations models. Newey and Powell (1989) have considered estimation of
such models, and their result can be applied here. The basic idea is to apply the
previous estimator, but with $P(v, \gamma)$ chosen to be a member of an increasing sequence of
approximating families and the IV equation (3.6) replaced by a nonparametric conditional
expectation equation.

Let $\mathcal{G}$ be a set of functions of $v$ that will be assumed to include the true
density $g_0(v)$ and satisfy other regularity conditions given below. Also, let
$\{P_J(v, \gamma)\}$ be a sequence of families such that $P_J(v, \gamma)\varphi(v)$ can approximate any
function. Let $\mathcal{G}_J = \{P_J(v, \gamma)\varphi(v)\} \cap \mathcal{G}$, $\theta = (\beta, g)$ be the parameter consisting of the
Euclidean vector $\beta$ and a density $g$, and $\hat\rho_i(\theta)$ be the simulated residual of equation
(3.5) with $P_J(v, \gamma)$ replacing $P(v, \gamma)$. Let $\hat E[\hat\rho_i(\theta) \mid x_i]$ be a nonparametric estimator
of the conditional expectation of $\hat\rho_i(\theta)$ given $x_i$, such as a series or kernel
estimator. Then a minimum distance estimator of $\theta_0 = (\beta_0, g_0)$ is

(4.1)  $\hat\theta = \mathrm{argmin}_{\theta \in \mathcal{B} \times \mathcal{G}_{\hat J}} \hat Q(\theta); \quad \hat Q(\theta) = \sum_{i=1}^{n} \hat E[\hat\rho_i(\theta) \mid x_i]' \hat D \hat E[\hat\rho_i(\theta) \mid x_i] / n,$

where $\hat D$ is a positive definite matrix and $\hat J$ can depend on the data and on sample
size. The objective function in equation (4.1) is a sample analog of

       $Q(\theta) = E[E[\rho(z, \theta) \mid x]' D E[\rho(z, \theta) \mid x]],$

where $D$ is the limit of $\hat D$ and $\rho(z, \theta) = \int H(z, \beta, v) g(v)\, dv$. If $D$ is positive definite
and $\theta_0$ is identified from the conditional moment equation (3.2) (i.e. that equation has
a unique solution), then $Q(\theta)$ will have a unique minimum of zero at $\theta_0$. The general
extremum estimator reasoning (e.g. Newey and McFadden, 1994) then suggests that $\hat\theta$
should be consistent.
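The sample objective in (4.1) can be sketched with a nearest-neighbor estimator of the conditional expectation. The code below is only a sketch under assumptions made here: a scalar conditioning variable, uniform weights on the $k$ nearest neighbors (each observation counts as its own nearest neighbor), and residuals supplied as an array. The names `knn_weights` and `md_objective` are hypothetical.

```python
import numpy as np

def knn_weights(x, k):
    """(n, n) matrix of uniform k-nearest-neighbor weights for scalar x."""
    n = len(x)
    dist = np.abs(x[:, None] - x[None, :])
    idx = np.argsort(dist, axis=1)[:, :k]          # k closest points, including self
    W = np.zeros((n, n))
    W[np.arange(n)[:, None], idx] = 1.0 / k
    return W

def md_objective(rho, x, D, k):
    """Average of Ehat[rho | x_i]' D Ehat[rho | x_i] over observations."""
    fitted = knn_weights(x, k) @ rho               # (n, r) estimated conditional means
    return float(np.einsum('ir,rs,is->', fitted, D, fitted) / len(x))

rng = np.random.default_rng(4)
x = rng.standard_normal(200)
rho = rng.standard_normal((200, 2))                # stand-in for simulated residuals
W = knn_weights(x, k=10)
q = md_objective(rho, x, np.eye(2), k=10)
q_zero = md_objective(np.zeros((200, 2)), x, np.eye(2), k=10)
print(q, q_zero)
```

The weights are nonnegative and sum to one across observations for each $i$, which is the weighted-average form of conditional expectation estimator.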

The estimator can be shown to be consistent if $\beta$ and $g$ are restricted to compact
sets, similarly to Gallant (1987). The compact function set assumption is a strong one,
but the results of Newey and Powell (1989) indicate its importance for minimum distance
estimators of the form considered here. For a matrix $A = [a_{ij}]$, let $\|A\| =
[\mathrm{trace}(A'A)]^{1/2}$, and for a function $g(v)$ let $\|g\|$ denote a function norm, to be
further discussed below.


Assumption 4.1: $\beta_0 \in \mathcal{B}$, which is compact, and $g_0 \in \mathcal{G}$, a compact set in a norm $\|g\|$.


In the primitive regularity conditions given below, $\|g\|$ will be a Sobolev norm.

The following dominance condition will be useful in showing uniform convergence.

Assumption 4.2: There exists $M(z, v)$ such that for $\beta, \tilde\beta \in \mathcal{B}$, $\|H(z, \beta, v)\| \le
M(z, v)$, $\|H(z, \tilde\beta, v) - H(z, \beta, v)\| \le M(z, v)\|\tilde\beta - \beta\|$.

Moment conditions for the dominating function $M(z, v)$ will be specified below.

To show uniform convergence of the objective function of equation (4.1), it is
useful to impose a strong condition on the function norm, that it dominates a weighted
supremum norm. Let $V$ denote the support of $\varphi(v)$, and

       $\|g\|_{V,\omega} = \sup_{v \in V} |g(v)|\,\omega(v), \quad \omega(v) > 0.$

Below it will be assumed that $\|g\|_{V,\omega}$ is dominated by $\|g\|$, so that Assumption 4.1
implies that $\|g\|_{V,\omega}$ is bounded on $\mathcal{G}$. The import of this assumption is a uniform bound
on the tail behavior of $g(v)$, imposed by the presence of the weight function; the
faster $\omega(v)$ grows as $v$ moves outward, the faster the tails of $g(v)$ must go to zero
in order to guarantee that $\sup_v \{|g(v)|\,\omega(v)\}$ is finite. Also, the nature of importance
sampling imposes a restriction on the tail thickness of the true density relative to the
baseline density. For second moment dominance this restriction will translate into a
restriction on $\omega(v)$ relative to $\varphi(v)$. These considerations lead to the following
assumption:

Assumption 4.3: $\|g\|_{V,\omega} \le \|g\|$ for $g \in \mathcal{G}$ and there exists $\varepsilon > 0$ such that

ElX^VKz.v)       lu(v)(p(v)]        u(v)    dv]  <  oo. 

In order to guarantee that the parametric approximation suffices for consistency, the
following denseness condition will be imposed.

Assumption 4.4: For any $g \in \mathcal{G}$ and $J$ there exists $P_J(v, \gamma)\varphi(v) \in \mathcal{G}_J$ such that
$\lim_{J \to \infty} \|P_J(\cdot, \gamma)\varphi - g\| = 0$.

This condition specifies that $g_0$ can be approximated by the family.

It is necessary to make some assumption concerning the conditional expectations
estimator. The following condition is lifted from Newey and Powell (1989). Without
changing notation assume that the data observation $z_i$ includes the simulation draws
$(v_{i1}, \ldots, v_{iS})$. Assume that the data are stationary.
Assumption 4.5: For $\varepsilon > 0$ from Assumption 4.3, i) $\hat D \to_p D$, $D$ is positive definite,
and  if     E[|^(z.)|^*^''^]  <  oo     then     ^.^i^^^i'^"  "^  E[«//(2.)l;     ii)  if  Elf/ztz)^""^]     is 

finite,     T.",  IIEl^tz)  |x.]  -  E[.//(z)  |x.lll^/n  -^  0;     iii)  either     a)  E[i/»(z)|x.]  = 
^1=1  11  1 

r.",w.  .i/»(z.).     w..  i  0.     I.",w..  =  1,     (s,t=l n).   and  and  if     E[  |i//(z)  I  ^■^^''^l  <  oo, 

J=l    iJ       J  U  J=l    iJ 

r.",E[^(z)|z.]/n  =  0   (1);      or     b)  E[0(z)|z.]  =  P'.  (J^.",?  .P'.j'j]."  ,P  ..//(z.). 
^1=1     ^1  p  ^11  ^j=l    J    J     ^j=l    j"^     J 

Assumption 4.5 can easily be checked in some cases and is quite general. For instance,
if $z_i$ is i.i.d. then it is easy to use known results to show that Assumption 4.5 holds
for nearest neighbor and series estimators. For K-nearest-neighbor estimators with $K \to
\infty$, $K/n \to 0$, ii) follows by Lemma 8 of Robinson (1987) and Proposition 1 of Stone
(1977), while iii) a) holds by construction. For a series estimator of the form given in
iii) b), with $P$ containing $K$ elements such that any function with finite mean square
can be approximated arbitrarily well in mean-square for large enough $K$, ii) follows
from Lemma A.10 of Newey (1993a) and the arguments for Lemma A.11 as long as $K \to \infty$ and

Neither of these results allows for data based $K$. It should be noted that
Assumption 4.5 restricts the form of randomness. Implicitly the form of the weights $w_{st}$
in Assumption 4.5 and the approximating functions $P$ are restricted to not depend on
$\psi$. Thus, while they could be chosen based on some fixed $\psi$ (e.g. a linear combination
of $\rho(z, \tilde\theta)$ for some preliminary estimator $\tilde\theta$), they are not allowed to vary with $\psi$
(i.e. with $\theta$ in $\rho(z, \theta)$). Assumption 4.5 should also be "plug-compatible" with future
results on nonparametric conditional expectation estimators, such as those for time
series.

The last assumption specifies that $\hat J$ must go to infinity with the sample size.

Assumption 4.6: $\hat J \to_p \infty$ as $n \to \infty$.

As mentioned earlier, the degree of approximation $\hat J$ can be random, in a very general
way. However, it should be noted that it is not restrictions on the growth rate of $\hat J$
that are used to obtain consistency, but rather the restriction of the function to a
compact set. Often, the compactness condition will require that higher order derivatives
be uniformly bounded, a condition that will have more "bite" for large values of $\hat J$,
imposing strong constraints on the coefficients of higher order terms.

These assumptions and identification deliver the following consistency result:

Theorem 4.1: If $E[\rho(z, \theta) \mid x] = 0$ has a unique solution on $\mathcal{B} \times \mathcal{G}$ at $\theta_0$ and Assumptions
4.1 - 4.6 are satisfied, then $\|\hat\beta - \beta_0\| \to_p 0$ and $\|\hat g - g_0\| \to_p 0$.

It should be noted that the hypotheses of this theorem are not very primitive until the
norm $\|g\|$ is specified. Once that is specified, it may require some work to check the
other assumptions.

The  following  set  of  Assumptions  is  sufficient  to  demonstrate  that  the  assumptions 
are  not  vacuous,  and  do  cover  cases  of  some  interest. 

Assumption 4.7: i) $v$ is one-dimensional; ii) There is a compact interval $V$ and a
fixed constant $B$ such that $\mathcal{G} = \{g(v) : g(v) = 0$ for $v \notin V$, $\sup_v |g(v)| \le B$,
$|g(\tilde v) - g(v)| \le B|\tilde v - v|$ for all $\tilde v, v \in V\}$; iii) $\|g\| = \sup_v |g(v)|$; iv) The support of
$\varphi(v)$ is $V$ and $\varphi(v)$ is continuous and bounded away from zero on $V$; v) $P(v, \gamma)$ =
I-  qT-V^;     vi)     Elsup^^^M(z,v)        ]  <  00. 

Corollary 4.2: If Assumptions 3.1, 4.2, 4.5 - 4.7 are satisfied, $\beta_0 \in \mathcal{B}$,
and $\mathcal{B}$ is compact, then $\|\hat\beta - \beta_0\| \to_p 0$ and $\|\hat g - g_0\| \to_p 0$.

This  result  is  restrictive  in  several  ways.     It  is  easy  to  relax  the  assumption  that     v 
is  one  dimensional,  using  the  results  of  Elbadawi,  Gallant,  and  Souza  (1983).     It  is  more 
difficult  to  allow  for  noncompact  support  for     v,     although  this  extension  is  possible 
using  the  results  of  Gallant  and  Nychka  (1987).     Unfortunately,  their  result  allows  for 
quite  thick  tails,  with     w(v)  =  C(l+v'v)       in  Assumption  4.3.     This  tail  behavior  does 


not allow Assumption 4.3 to be satisfied when $\varphi(v)$ is the standard normal density. Of
course,  there  are  fast  computational  methods  for  generating  data  from  densities 
proportional  to     (l+v'v)     ,     so  that  one  could  easily  use  such  thick-tailed  baseline 
densities.     Also,   it  should  be  possible  to  develop  intermediate  conditions  that  allow  for 
more  general  simulators. 


5.        A  Sampling  Experiment 

A  small  Monte  Carlo  study  is  useful  in  a  rough  check  of  whether  the  estimator  can 
give  satisfactory  results  in  practice.     Consider  the  model 

•  -w 

(5.1)  y  =  6^  +  S^w     +  5^e        +  c,     5^  =  5^  =  1.     C     is     N(O.l), 

        $w = w^* + \eta, \quad \eta$ is $N(0, 1)$,

        $w^* = \pi_1 + \pi_2 x + v, \quad \pi_1 = \pi_2 = 1, \quad x$ and $v$ are $N(0, .5)$.

The  regression  equation  for  this  model  is  one  that  is  useful  in  estimating  the 
relationship  between  consumption  and   income.     This  specification  will  be  further 
discussed  in  Section  6,    where  it  is  used  in  the  empirical  example.     The  parameter  values 
were set so that the r-squared for the prediction equation for $w^*$ was 1/2, and so the
signal to noise ratio was 1. The number of observations was set to 100. The number of
observations was chosen to be small relative to typical sample sizes in economics, to
make computation easier. The r-squared for the regression of $w^*$ on $x$ was set higher
than typical in order to offset the small sample size, so that the estimator might be
informative.
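The stated r-squared and signal to noise ratio can be checked by simulating the regressor equations of the design. The sketch below assumes that $N(0, .5)$ denotes a variance of .5 (the reading under which the prediction equation has an r-squared of 1/2) and uses a large sample purely to approximate population quantities; the experiment itself uses 100 observations.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000                                  # large n only to approximate population values
x = rng.normal(0.0, np.sqrt(0.5), n)         # predictor, variance .5 (assumed reading)
v = rng.normal(0.0, np.sqrt(0.5), n)         # prediction error, variance .5
eta = rng.normal(0.0, 1.0, n)                # measurement error, variance 1
w_star = 1.0 + 1.0 * x + v                   # pi_1 = pi_2 = 1
w = w_star + eta                             # observed regressor

r_squared = np.corrcoef(x, w_star)[0, 1] ** 2        # r-squared of w* on x
signal_to_noise = np.var(w_star) / np.var(eta)       # Var(w*) / Var(eta)
print(r_squared, signal_to_noise)
```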

Table  One  reports  the  results  from  100  replications.     Results  for  three  different 
estimators of $\delta_1$, $\delta_2$, and $\delta_3$ are reported. The first estimator is an ordinary least
squares (OLS) regression of $y$ on the mismeasured right-hand side variables $(1, w, e^{-w})$.

The  second  estimator  is  an  instrumental  variables  estimator  (IV)  with  the  same  right-hand 

side  but  with  instruments     R.  =  (l,h.,h.,h. )' ,     where     h.  =  n,   +  ir„x..     The  third 

1  111  1  1         2  1 

estimator is a simulated moment estimator (SM) from equation (3.9), with $C(x_i) = I \otimes R_i$
and $W = I \otimes (\sum_{i=1}^{n} R_i R_i')^{-1}$, where $I$ is a two dimensional identity matrix. This estimator
is a system two-stage least squares estimator where the instrumental variables are $R_i$.
Also, $P(v, \gamma)$ was a Hermite polynomial of the third-order, where $\gamma_1 = 1$, $\gamma_2 = \gamma_3 = 0$,
and $\gamma_4$ was estimated. There were two simulations per observation.

In one replication out of the 100 the estimator did not converge to a stationary
point. This replication was excluded from the results reported in Table One.
The estimator shows promise. The standard errors of the IV and simulated moment
estimators are much larger than that of the OLS estimator, but the biases are
substantially smaller. As previously noted, the IV estimator is inconsistent, although
in this example it leads to bias reduction. It is interesting to note that the standard
error of the SM estimator is smaller than that of the IV estimator. Thus, in this
example the valid SM correction for measurement error leads to both smaller bias and
smaller variance than the inconsistent IV correction.
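The bias and variance comparison can be checked against the root mean squared errors in Table One, since RMSE combines the two as the square root of squared bias plus squared standard error. The following is a minimal sketch of that arithmetic for the first coefficient column of Table One; the dictionary and function names are illustrative and not part of the original study.

```python
import math

# (bias, standard error) pairs for the first coefficient column of Table One.
TABLE_ONE_FIRST_COEFFICIENT = {
    "OLS": (-1.10, 0.25),
    "IV": (-0.30, 3.31),
    "SM": (-0.09, 2.27),
}


def rmse(bias: float, se: float) -> float:
    """Root mean squared error implied by a bias and a standard error."""
    return math.sqrt(bias ** 2 + se ** 2)


for name, (bias, se) in TABLE_ONE_FIRST_COEFFICIENT.items():
    print(f"{name}: RMSE = {rmse(bias, se):.2f}")
```

Rounded to two decimals, the computed values agree with the RMSE entries reported for this coefficient.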


6.        An  Application  to  Engel  Curve  Estimation 

The application presented here is estimation of Engel curves, a subject that has
long been of interest in econometrics. Measurement error has recently been shown in
Hausman, Newey, and Powell (1993) to be important in the estimation of nonlinear Engel
curves. This section adds to that work by estimating a nonlinear, nonpolynomial Engel
curve for the model of equation (2.1), which was not considered by Hausman, Newey, and
Powell (1993). The functional form considered here is that preferred by Leser (1963),

(5.1)  $S_i = \delta_1 + \delta_2 \ln(I_i^*) + \delta_3 (1/I_i^*) + \varepsilon_i,$

where $S_i$ is the share of expenditure on a commodity and $I_i^*$ is the true total
expenditure. As suggested by the Hausman, Newey, and Powell (1993) tests of the Gorman
(1981) rank restriction, a rank two specification such as this may be a good
specification, once the measurement error has been accounted for.

In addition, a specification is considered that accounts for the presence of $I_i^*$
in the denominator of the left-hand side of this equation. This "denominator problem"
results from the fact that $S_i = Y_i/I_i^*$, where $Y_i$ is the expenditure on the
commodity. Thus, if $I_i^*$ is measured with error, another nonlinear measurement error
problem results from using the measured shares. This problem can be dealt with by
bringing $I_i^*$ out of the denominator, giving

(5.2)  $Y_i = \delta_1 I_i^* + \delta_2 I_i^* \ln(I_i^*) + \delta_3 + I_i^* \varepsilon_i.$

If $\varepsilon_i$ satisfies the usual restriction $E[\varepsilon_i | I_i^*] = 0$, then
equations (5.1) and (5.2) are equivalent statistical specifications, in that running
least squares on either equation should give a consistent estimator. Covariates will
also be allowed in this specification by allowing additional variables $I_i^* x_{1i}$
to enter linearly in this equation, corresponding to inclusion of $x_{1i}$ as
additional regressors in the share equation (5.1).

The measurement error will be assumed to be multiplicative, i.e. for $I_i$ equal to
the observed total expenditure,

(5.3)  $\ln(I_i^*) = w_i^* = \pi_0'x_i + v_i, \quad \ln(I_i) = \ln(I_i^*) + \eta_i = w_i = w_i^* + \eta_i.$

In the empirical work the predictor variables $x_i$ will be a constant, age and age
squared for household head and spouse, and dummies for educational attainment,
spouse employment, home ownership, industry, occupation, region, and black or white, a
total of 19 variables, including the constant. With this specification for the
measurement and prediction equations,
$f(w^*,\delta) = \delta_1 + \delta_2 w^* + \delta_3 \exp(-w^*)$, as in the
Monte Carlo example.

The measurement error in the left-hand side denominator can be accounted for as in
equation (5.2), leading to a specification with
$f(w^*,\delta) = \delta_1 \exp(w^*) + \delta_2 \exp(w^*) w^* + \delta_3$.
It is interesting to note that even if the share equation is linear in $\ln(I_i^*)$, so
that $\delta_3 = 0$, this equation is nonlinear, so that IV will not be consistent.
Thus, measurement error in the denominator of the share suggests the need for the
estimators developed here.

The data used in estimation are from the 1982 Consumer Expenditure Survey (CES).
The basic data we use are total expenditure and expenditure on commodity groups from the
first quarter of 1982. Results were obtained for four commodity groups: food, clothing,
transportation, and recreation. The number of observations in the data set is 1321. The
empirical results are reported as elasticities, i.e. dlnf(x)/dlnx, as is common in
econometrics. To compare shapes, elasticities were calculated at the quartiles of
observed expenditure.
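As an illustration of how such elasticities follow from the share form (5.1), the sketch below assumes the reported quantity is the elasticity of commodity expenditure $Y = S \cdot I$ with respect to total expenditure $I$, which for $S = \delta_1 + \delta_2 \ln I + \delta_3 / I$ works out to $1 + (\delta_2 - \delta_3/I)/S$. The coefficient values are hypothetical and chosen only to give positive shares; the quartiles are those listed in Table Two.

```python
import math

# Quartiles of observed expenditure, as listed in Table Two.
QUARTILES = (3373.0, 4574.0, 6417.0)

# Hypothetical coefficients, for illustration only (not estimates from the paper).
D1, D2, D3 = 1.0, -0.1, 50.0


def share(total_exp: float, d1: float, d2: float, d3: float) -> float:
    """Budget share S = d1 + d2*ln(I) + d3/I, the functional form of (5.1)."""
    return d1 + d2 * math.log(total_exp) + d3 / total_exp


def expenditure_elasticity(total_exp: float, d1: float, d2: float, d3: float) -> float:
    """dln(Y)/dln(I) for Y = S*I, which equals 1 + (d2 - d3/I)/S."""
    s = share(total_exp, d1, d2, d3)
    return 1.0 + (d2 - d3 / total_exp) / s


for q in QUARTILES:
    print(f"I = {q:.0f}: elasticity = {expenditure_elasticity(q, D1, D2, D3):.3f}")
```

The closed form avoids numerical differentiation, and an elasticity below one corresponds to a share that falls as total expenditure rises.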

The results are given in Tables Two through Five. Table Two gives some sample
statistics, including the quartiles of the income distribution. The other tables
include estimated expenditure elasticities at these quartiles. Table Two also gives
information on the prediction regression. The $R^2$ in this regression is .23, which is
quite sizeable for such a cross-section data set. The other information is useful in
calculating the magnitude of the measurement error and bounding the size of the variance
of the prediction error $v$. In particular, the model we have assumed implies that the
standard error .45 of the residual is an upper bound on the standard deviations of both
the measurement error and the prediction error $v$. Also, given an estimator
$\hat\sigma$ of $\mathrm{Var}(v)^{1/2}$, an estimator of the $R^2$ of the measurement
equation, which determines the magnitude of the measurement error bias in a linear
model, is

$\mathrm{Var}(\pi'x + v)/\mathrm{Var}(w) = [\mathrm{Var}(\pi'x) + \sigma^2]/\mathrm{Var}(w) = [(.25)^2 + \hat\sigma^2]/(.51)^2 = .24 + (3.8)\hat\sigma^2.$
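The arithmetic behind this expression can be sketched as follows. The default inputs are the Table Two standard errors of the predicted value (.25) and of log expenditure (.51); the function name and the trial values of the prediction error standard deviation are illustrative assumptions, not estimates taken from the tables.

```python
def measurement_r2(sigma_hat: float,
                   sd_predicted: float = 0.25,
                   sd_log_expenditure: float = 0.51) -> float:
    """R-squared of the measurement equation implied by a prediction error s.d.

    Computes [Var(pi'x) + sigma_hat^2] / Var(w) using the sample standard errors.
    """
    return (sd_predicted ** 2 + sigma_hat ** 2) / sd_log_expenditure ** 2


# Hypothetical values of the prediction error standard deviation.
for sigma_hat in (0.0, 0.20, 0.35):
    print(f"sigma_hat = {sigma_hat:.2f}: R-squared = {measurement_r2(sigma_hat):.2f}")
```

With a prediction error standard deviation of zero the expression reduces to the constant term of about .24, and it rises toward one as the standard deviation approaches the residual standard error.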

Tables Three to Five give results for each commodity for three different
specifications of the share equation and four different estimators. Table Three gives
results for the share equation, where measurement error in the denominator of the
left-hand side is ignored. This specification is the same as in the Monte Carlo study.
Table Four changes the specification to account for the left-hand side denominator by
multiplying through the original equation by total expenditure, as described above.
Table Five adds covariates $x_1$ to the share equation to allow for demographic and
regional price effects. There are six covariates: own and spouse age, family size, and
three regional dummy variables. The equation estimated is analogous to that of Table
Four in accounting for the left-hand side denominator, with
$f(w^*, x_1, \delta) = \delta_1 \exp(w^*) + \delta_2 \exp(w^*) w^* + \delta_3 + \exp(w^*) x_1'\delta_4$.
It should be noted that this specification restricts family size to be absent from the
prediction equation.

Tables Three to Five report results for four different estimators: ordinary least
squares (LS), two stage least squares (IV) with instruments described below, the
simulated moment estimator with Gaussian $v$ (SMO), and the simulated moment estimator
with one Hermite polynomial term (SMI), of the third order, included in the moment
functions. The simulated moment estimators are each obtained as in equation (3.9), with
$\rho_i(\alpha)$ as given in equation (3.8), 10 simulation draws, and $W$ equal to the
inverse of an estimated asymptotic variance of
$\sum_{i=1}^n C(x_i)\rho_i(\alpha)/\sqrt{n}$. Specifically,

(5.4)  $W = \hat\Sigma^{-1}, \quad \hat\Sigma = n^{-1}\sum_{i=1}^n \hat U_i \hat U_i',$

       $\hat U_i = C(x_i)\rho_i(\tilde\alpha) + [\partial \sum_{j=1}^n C(x_j)\rho_j(\tilde\alpha)/\partial\pi](\sum_{j=1}^n x_j x_j')^{-1} x_i (w_i - \hat\pi'x_i),$

where $\tilde\alpha$ is an initial consistent estimator.[2] This is an asymptotic
variance minimizing choice of $W$ that accounts for the presence of $\hat\pi$ in
$\rho_i$.

[2] The procedure used to obtain the initial consistent estimators was to begin with an
identity weighting matrix, use a few iterations to obtain "reasonable" parameter values,
calculate $W$ as in equation (5.4), and then minimize to get $\tilde\alpha$.

The standard errors for LS and IV were calculated from heteroskedasticity consistent
formulae, e.g. as given in White (1982). The standard errors for simulated moment
estimators were calculated from the GMM asymptotic variance estimator
$(\hat H'\hat\Sigma^{-1}\hat H)^{-1}$, where

$\hat H = \partial \sum_{i=1}^n C(x_i)\rho_i(\hat\alpha)/\partial\alpha.$

A preselection process was used to choose the order of powers of the predicted
value to include in the instruments. Starting at the second order, the minimum needed
to have enough moments to allow estimation of distribution parameters, the order was
chosen by cross-validation on the food equation, Gaussian, simulated moment estimator
(SMO), using the cross-validation criteria for choice of instruments suggested in Newey
(1993b). Inclusion of higher order powers did not result in any decrease in the
cross-validation criteria. Consequently, in Tables Three and Four the instrumental
variables were $(1, x'\hat\pi, (x'\hat\pi)^2)$. In Table Five $\exp(x'\hat\pi) \cdot x_1$
was added to the instruments, because of the presence of the covariates.
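A minimal sketch of how an instrument vector made of powers of the predicted value might be assembled for one observation is given below. The function name, argument names, and example numbers are hypothetical; only the form (1, predicted value, predicted value squared) is taken from the text.

```python
from typing import List, Sequence


def instrument_row(x: Sequence[float], pi_hat: Sequence[float],
                   max_power: int = 2) -> List[float]:
    """Return [1, x'pi, (x'pi)^2, ..., (x'pi)^max_power] for one observation."""
    if len(x) != len(pi_hat):
        raise ValueError("x and pi_hat must have the same length")
    # Predicted value of the true regressor from the prediction equation.
    predicted = sum(xi * pi for xi, pi in zip(x, pi_hat))
    # Power 0 gives the constant term; higher powers follow in order.
    return [predicted ** power for power in range(max_power + 1)]


# Hypothetical predictors (constant plus one variable) and first-stage coefficients.
print(instrument_row([1.0, 3.0], [0.5, 0.5]))
```

Raising `max_power` mirrors the inclusion of higher order powers considered in the cross-validation step.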

The number of Hermite polynomial terms to include was chosen essentially by an
upwards testing procedure, applied in the model of Table Three. Inclusion of a third
order term was tried in each case, as reported in Table Three. This term allows for
asymmetry in the distribution of $v$. If it was statistically significant, a fourth
order term was tried. In none of the cases was this term significant, so only results
for the one, third order, Hermite polynomial term are reported in the tables.

For each estimator, elasticities at the quartiles, the estimate of
$\sigma = \mathrm{Var}(v)^{1/2}$, and the estimate of the coefficient $\gamma$ of the
Hermite polynomial term, as well as standard errors (in parentheses below the
estimates), are reported. The (asymptotic) t-statistic on the coefficient of inverse
expenditure (t-stat) and the overidentification (minimum chi-square) test statistic (Q)
for the simulated moment estimators are also reported. The t-statistic is particularly
relevant in Table Three because the 2SLS estimator would be consistent if the
coefficient on inverse expenditure were zero. The degrees of freedom of the
overidentification test statistic are 2 and 1 respectively for SMO and SMI in Tables
Three and Four, and 8 and 7 respectively in Table Five. The difference between these
statistics for SMO and SMI is a one degree of freedom chi-squared test of the Hermite
coefficient being zero.
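The difference test can be sketched with the standard library alone, using the fact that the upper tail of a chi-squared variable with one degree of freedom equals erfc(sqrt(x/2)). The function names are illustrative, and the two inputs are the Q statistics listed for food in Table Three (6.36 for SMO and 6.08 for SMI), used here only as an example.

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Upper tail probability P(X > x) for X chi-squared with 1 degree of freedom."""
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


def hermite_term_test(q_restricted: float, q_unrestricted: float):
    """Difference of overidentification statistics and its 1-df p-value."""
    diff = q_restricted - q_unrestricted
    return diff, chi2_sf_1df(diff)


# Q statistics for food as listed in Table Three (SMO, then SMI).
diff, p_value = hermite_term_test(6.36, 6.08)
print(f"difference = {diff:.2f}, p-value = {p_value:.2f}")
```

A large p-value here means the restriction that the Hermite coefficient is zero is not rejected for that commodity.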

Even though the IV estimator is inconsistent, it gives results similar to the SM
estimator in a number of cases. When the share denominator is allowed to be measured
with error there are larger differences between IV and SM. The standard errors of SM are
smaller than those of IV, which is consistent with the Monte Carlo results of Section 4.
There are large differences between the OLS and SM estimators, as is consistent with the
presence of measurement error. It is interesting to note that the elasticities for
transportation go down rather than up, unlike linear regression with measurement error.

In comparing Tables 3 and 4, it is apparent that accounting for measurement error in
the denominator leads to some changes in the results. There is more nonlinearity in the
food equation in Table 4 than in Table 3. The prediction error standard deviation
$\sigma$ is more precisely estimated in these equations. The overidentification test
statistics are larger in Table 4. Surprisingly, the estimated standard errors in Table 4
are not much larger than those in Table 3, although Table 4 is a levels equation that is
sometimes thought to be more heteroskedastic than the share equation. There is little
evidence of nonnormality. In most cases the SMO estimates are quite similar to the SMI
estimates, except that SMI has much larger standard errors.

In summary, although allowing for nonnormality does not change the empirical
results, correcting for measurement error makes a big difference. In several cases the
simulated moments estimator is quite different from the inconsistent IV estimator,
suggesting that the inconsistency of the IV estimator may not be uniformly small.
Furthermore, the simulated moment estimators seem quite accurate, having small standard
errors. These results illustrate the usefulness of simulated moment estimation to
correct for measurement error, while allowing some flexibility in the distribution of
the prediction error to assess the impact of allowing for nonnormality.

Appendix

Proof of Theorem 4.1: The proof proceeds by verifying the hypotheses of Theorem 5.1 of
Newey and Powell (1989). Let the norm for $\theta = (\beta, g)$ be
$\|\theta\| = \|\beta\| + \|g\|$. Note that $\Theta = \mathcal{B} \times \mathcal{G}$
is compact by $\mathcal{B}$ and $\mathcal{G}$ compact. For $S$ simulations let $Z$
denote the augmented data vector, with $Z = (z, v_1, \ldots, v_S)$. Also, let
$\rho(Z,\theta) = S^{-1}\sum_{s=1}^S H(z,\beta,v_s)P(v_s,\gamma)$. Note that for
$p_0(v) = g_0(v)/\varphi(v)$, it follows by Assumptions 4.2 and 4.3 and the triangle
inequality that

(A.1)  $\|\rho(Z,\theta_0)\| \le \sum_{s=1}^S \|H(z,\beta_0,v_s)\| \, |p_0(v_s)|/S \le \{\sum_{s=1}^S M(z,v_s)[\omega(v_s)\varphi(v_s)]^{-1}/S\} \|g_0\|,$


(A.2)  {E[llp(Z,e_)ll^''^]>^''^^"'^^  £  C-r^,<E[m(z,v  )[u(v  )v)(v  )]  h'^''^]}^^^^''^^/S 

0  *^=1  s         s        s 

=  C.'(E[J^M(z,v)^*^w(v)"^"%(v)"^"^dvl}>^^^^''^^  <  CO. 


It follows similarly to equation (A.1) that for $\tilde\theta, \theta \in \Theta$,

(A.3)  $\|\rho(Z,\tilde\theta) - \rho(Z,\theta)\| \le C \{\sum_{s=1}^S M(z,v_s)[\omega(v_s)\varphi(v_s)]^{-1}/S\} \|\tilde\theta - \theta\|,$

so that Assumption 5.1 of Newey and Powell (1989) follows by eq. (A.2). Assumptions 5.2
and 5.3 then follow by Assumptions 4.4 and 4.5. Furthermore, by the fact that
$E[\rho(Z,\theta)|x] = E[\rho(z,\beta,g)|x]$ for an unbiased simulator, as noted in the
text, Newey and Powell's (1989) Assumption 3.1 holds by Assumption 2.1. The conclusion
then follows by the conclusion of Theorem 5.1 of Newey and Powell (1989). QED.

Proof of Corollary 4.2: The proof proceeds by verifying the hypotheses of Theorem 4.1.
Assumption 4.1 follows by hypothesis and the Arzela theorem, which gives compactness of
$\mathcal{G}$ in the sup norm. Assumption 4.3 follows with $\omega(v) = 1$ by
$\varphi(v)$ bounded away from zero and Assumption 4.6, vi). Assumption 4.4 follows by
a Weierstrass approximation of $g(v)/\varphi(v)$ by $P(v,\gamma)$. The proof then
follows by the conclusion of Theorem 4.1. QED.
Table One: Monte Carlo Results

              $\delta_1$                  $\delta_2$                  $\delta_3$
        Bias    SE      RMSE      Bias    SE      RMSE      Bias    SE      RMSE
OLS     -1.10   .25     1.13      .67     .13     .67       .85     .12     .86
IV      -.30    3.31    3.32      .13     1.25    1.26      .47     2.01    2.07
SM      -.09    2.27    2.27      .07     1.05    1.05      .31     1.62    1.65

Table Two: Some Sample Statistics

                                               25th    50th    75th
Income Quartiles                               3373    4574    6417

Sample standard error of log of expenditure    .51
Standard error of predicted                    .25
Standard error of residual                     .45
R-squared                                      .23

Table Three: Elasticity Estimates for Share Equations

Food
        25th      50th      75th      $\sigma$    $\gamma$    t-stat    Q
LS      .72       .66       .59                               4.52
        (.02)     (.02)     (.03)
2SLS    .82       .78       .74                               .47
        (.05)     (.04)     (.06)
SMO     .82       .78       .74       .61                     3.67      6.36
        (.04)     (.04)     (.06)     (.08)
SMI     .84       .78       .71       .31         -.01        .04       6.08
        (.20)     (.09)     (.36)     (1.49)      (.05)
Clothing
        25th      50th      75th      $\sigma$    $\gamma$    t-stat    Q
LS      1.21      1.08      .97                               18.66
        (.05)     (.04)     (.05)
2SLS    1.61      1.42      1.30                              2.25
        (.12)     (.09)     (.10)
SMO     1.63      1.40      1.26      .02                     .56       6.11
        (.20)     (.10)     (.18)     (.38)
SMI     1.62      1.28      1.07      -.0009      .15         .20       1.99
        (.46)     (.30)     (.56)     (.0018)     (.11)

Transportation
        25th      50th      75th      $\sigma$    $\gamma$    t-stat
LS      1.28      1.44      1.50                              11.19
        (.07)     (.06)     (.07)
2SLS    .99       1.06      1.12                              1.00
        (.08)     (.08)     (.12)
SMO     1.02      1.01      1.01      1.71                    .04
        (.07)     (.06)     (.06)     (2.01)
SMI     1.40      .98       .63       .10         .028        3.10
        (.27)     (.07)     (.18)     (.07)       (.018)

Recreation
        25th      50th      75th      $\sigma$    $\gamma$    t-stat
LS      1.40      1.20      1.06                              16.59
        (.07)     (.06)     (.07)
2SLS    1.70      1.31      1.07                              11.45
        (.15)     (.12)     (.15)
SMO     2.97      1.33      .39       .02                     6.48
        (.63)     (.12)     (.48)     (.34)
SMI     6.98      2.32      1.28      .71         .024        5.58
        (4.40)    (.36)     (.12)     (.18)       (.002)

11.28 


7.54 


.09 
Table Four: Elasticity Estimates for Level Equations

Food
        25th      50th      75th      $\sigma$    $\gamma$    t-stat    Q
LS      .68       .63       .57                               .31
        (.04)     (.03)     (.03)
2SLS    .90       .80       .70                               3.34
        (.08)     (.05)     (.05)
SMO     .98       .81       .63       .35                     12.4      18.16
        (.08)     (.04)     (.05)     (.02)
SMI     1.21      .83       .46       .24         .008        5.09      16.42
        (.21)     (.06)     (.13)     (.06)       (.005)

Clothing
        25th      50th      75th      $\sigma$    $\gamma$    t-stat    Q
LS      1.40      1.11      .89                               53.95
        (.11)     (.06)     (.04)
2SLS    2.04      1.50      1.21                              4.92
        (.26)     (.12)     (.13)
SMO     2.07      1.36      .96       .34                     39.04     17.15
        (.21)     (.08)     (.07)     (.02)
SMI     2.14      1.37      .93       .33         .001        3.19      17.13
        (.58)     (.11)     (.18)     (.09)       (.009)

Transportation
        25th      50th      75th      $\sigma$    $\gamma$    t-stat    Q
LS      3.14      1.95      1.48                              1.68
        (1.06)    (.19)     (.09)
2SLS    .23       .93       1.53                              2.04
        (.65)     (.12)     (.53)
SMO     1.16      .94       .76       .64                     38.27     10.63
        (.08)     (.05)     (.04)     (.04)
SMI     1.25      .97       .74       .58         .004        .69       10.52
        (.59)     (.20)     (.09)     (.27)       (.021)
Recreation
        25th      50th      75th      $\sigma$    $\gamma$    t-stat    Q
LS      1.74      1.26      .95                               42.87
        (.19)     (.09)     (.05)
2SLS    1.84      1.32      1.01                              14.85
        (.28)     (.16)     (.15)
SMO     2.35      1.44      .95       .36                     48.62     33.23
        (.26)     (.09)     (.07)     (.03)
SMI     7.85      1.93      .33       .25         .024        15.88     21.83
        (4.76)    (.34)     (.15)     (.02)       (.005)


Table Five: Elasticity Estimates for Level Equations with Covariates

Food
        25th      50th      75th      $\sigma$    $\gamma$    t-stat    Q
LS      .72       .66       .58                               1.90
        (.05)     (.03)     (.03)
2SLS    .97       .85       .74                               4.40
        (.09)     (.06)     (.06)
SMO     1.00      .86       .73       .33                     7.14      43.51
        (.08)     (.05)     (.05)     (.02)
SMI     1.25      .90       .58       .21         .009        2.47      41.59
        (.29)     (.07)     (.15)     (.07)       (.007)


Clothing
        25th      50th      75th      $\sigma$    $\gamma$    t-stat    Q
LS      1.35      1.08      .88                               32.76
        (.11)     (.06)     (.04)
2SLS    2.01      1.50      1.22                              4.22
        (.31)     (.14)     (.15)
SMO     1.94      1.31      .94       .33                     29.32     43.23
        (.22)     (.09)     (.08)     (.02)
SMI     2.01      1.33      .92       .31         .002        1.59      42.95
        (.73)     (.14)     (.23)     (.11)       (.011)




Transportation
        25th      50th      75th      $\sigma$    $\gamma$    t-stat    Q
LS      2.54      1.83      1.50                              1.30
        (.63)     (.14)     (.08)
2SLS    .05       .70       1.38                              3.09
        (.55)     (.16)     (.45)
SMO     1.07      .87       .69       .61                     28.01     30.12
        (.09)     (.06)     (.04)     (.04)
SMI     1.86      1.13      .61       .40         .016        2.61      28.92
        (.87)     (.26)     (.07)     (.08)       (.009)


Recreation
        25th      50th      75th      $\sigma$    $\gamma$    t-stat    Q
LS      1.73      1.25      .95                               35.96
        (.19)     (.09)     (.05)
2SLS    1.78      1.31      1.01                              10.69
        (.30)     (.17)     (.16)
SMO     2.40      1.41      .88       .31                     37.86     54.23
        (.32)     (.11)     (.08)     (.03)
SMI     8.38      1.91      .18       .21         .021        10.91     44.88
        (5.45)    (.32)     (.23)     (.03)       (.005)